Reliable telemetry

ABSTRACT

A system and methods for reliable telemetry are disclosed herein. In an example of reliable in-band telemetry in a communications network, intent information for a destination device may be generated at a network device, the intent information indicating a type of telemetry data to be collected. The network device may update a locally stored invertible Bloom filter (IBF) by applying one or more hash functions to the intent information, a destination identifier (ID) associated with the destination device, and/or a local timestamp, and periodically forward the locally stored IBF to the destination device. The network device may receive a notification message from the destination device indicating that the intent information is missing at the destination device and re-forward the intent information to the destination device. In another example, a network device may maintain and periodically forward a locally stored IBF based on response data and the destination ID.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 62/528,964, filed Jul. 5, 2017, which is incorporated by reference as if fully set forth herein.

FIELD OF INVENTION

The disclosure relates generally to a system and method for reliable telemetry.

BACKGROUND

Network telemetry involves the use of automated tools and processes designed to collect measurements and other data at points throughout the network, which can then be used for network monitoring and performance analysis.

In an example telemetry solution, the in-band network telemetry (INT) framework for packet networks is implemented in the data plane such that telemetry information is carried in data packets (e.g., in the header of data packets) and can be modified with each hop. The data plane refers to the part of a device's architecture that makes forwarding decisions for incoming packets. For example, forwarding may be determined by the device using a locally stored table in which the device looks up the destination address of the incoming packet and retrieves the information needed for forwarding.

The INT framework relies on programmable data planes to bring flexibility to telemetry data collection. Devices with programmable data planes include network processors or general-purpose central processing units (CPUs) at the low end, and data path programmable switch chips at the high end. With INT, a source switch (or more generally, a source network device) incorporates an instruction header to collect network state information as a part of the data packet. Intermediate INT-capable switches (devices) interpret the instruction header and collect and insert the desired network state information or responses in the data packet, which eventually reaches a sink switch and can be used as needed to monitor and evaluate the operation of the network. Advantages of INT include real-time telemetry rates, low CPU and operating system (O/S) overhead, and the flexibility to programmatically instrument packets to carry useful telemetry data.

In another example telemetry solution, the packet-optical in-band telemetry (POINT) framework provides in-band telemetry data for end-to-end correlation of collected network state data in mixed networks with multiple network layers, such as packet-optical networks. According to the POINT framework, a source device inserts an intent (POINT intent) instruction for telemetry data collection into the data flow. The intent communicates the parameters of data collection such as conditions for data collection, entities being monitored, and/or the type of data to be collected for that flow. Intermediate devices on that data flow process the high-level intent if it is targeted towards them, translate the intent into a suitable device-specific action for data collection and execute that action to collect a response.

The degree to which telemetry data can be depended on to be accurate, in other words the reliability of the data, is an important aspect of telemetry applications. For example, for the POINT framework, it is desirable to maintain reliable intent information and response information. While hop-by-hop reliability of intent/response information may use data path reliability mechanisms focused on data loss occurring along a network link (e.g., forward error correction (FEC), checksums, sequence numbers (SNs)), the end-to-end reliability of intent information may not be covered by data path reliability mechanisms. For example, intents and responses may be lost due to queuing employed at different levels (e.g., ingress/egress, layer boundary) in the devices that may not be covered by the mechanisms for detecting loss on a link. In fact, existing reliability solutions for telemetry applications may rely on best effort communications for the intent transfer across layers, and the response communication across layers. Thus, solutions for providing hop-by-hop reliability, end-to-end reliability, data integrity (i.e., the maintenance and assurance of the accuracy and consistency of data), and reliability of response aggregation are desirable for telemetry applications, including the INT and POINT frameworks.

SUMMARY

A system and methods for reliable telemetry are disclosed herein. In an example of reliable in-band telemetry in a communications network, intent information for a destination device may be generated at a network device (e.g., source device), such that the intent information indicates a type of telemetry data to be collected along a network path to the destination device. The network device may update a locally stored invertible Bloom filter (IBF) by applying one or more hash functions to the intent information, a destination identifier (ID) associated with the destination device, and/or a local timestamp. The network device may periodically forward the locally stored IBF to the destination device. The network device may receive a notification message, generated by the destination device based at least in part on the locally stored IBF, indicating that the intent information is missing at the destination device, and may re-forward the intent information to the destination device. In another example, a locally stored IBF based on response data and the destination ID may be maintained at a network device and periodically forwarded to the destination device. In an example, the disclosed reliable telemetry system and methods may be used in a packet-optical in-band telemetry (POINT) framework designed for gathering multi-layer telemetry data, which may be used in packet-optical networks.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a high-level diagram of an example packet-optical in-band telemetry (POINT) framework implemented in an example packet-optical network, in accordance with the disclosures herein;

FIG. 2 is a high-level illustration of the set difference problem for two hosts (e.g., two network devices);

FIG. 3 is an overview of the procedure for creating an IBF data structure;

FIG. 4 is an overview of the procedure for calculating the IBF difference;

FIG. 5 is a flow diagram of an example procedure for reliable routing of intent performed by a network device that originates the intent in a POINT framework implemented in a communications network, in accordance with the disclosures herein;

FIG. 6 is a flow diagram of an example procedure for reliable routing of response data performed by a network device that generates response data in a POINT framework implemented in a communications network, in accordance with the disclosures herein; and

FIG. 7 is a block diagram of a computing system in which one or more disclosed embodiments may be implemented.

DETAILED DESCRIPTION OF THE EMBODIMENTS

In the following, telemetry applications are described in more detail and examples are given of the disclosed reliability system and method as used in telemetry applications.

Streaming telemetry mechanisms, such as OpenConfig, are designed to streamline the notification of a network state by having the network elements stream the telemetry data up to a central management entity where the data gets stored and processed. While streaming telemetry mechanisms employ extensive offline algorithms to process telemetry data, they are not designed to inherently improve the quality of the data collected. As explained above, the INT framework relies on programmable data planes to bring flexibility to telemetry data collection. The programmable data planes used in INT have been explicitly designed for packet networks; however, extending INT mechanisms into optical networks, where there is no notion of data packets, is far from straightforward due to factors such as layering and the presence of purely analog devices.

The emergence of integrated packet and optical networks, or “packet-optical networks”, such as those interconnecting data centers, brings additional challenges for network telemetry because of the different types of telemetry data collected in packet versus optical networks. For example, telemetry data collected on a per-flow basis in a packet layer of a packet network, such as packet loss and latency, cannot be easily attributed to or correlated with data collected in the optical layer of an optical network, such as a bit error rate (BER) and quality factor (Q-factor). Moreover, an optical network lacks the digital constructs used by telemetry solutions such as INT, and the packet layer does not have access to measurements in the optical network. A further challenge occurs in associating packet flow telemetry data with the corresponding data from optical transport network (OTN) layers, which involves piecing together telemetry data from many devices.

Optical parameters may affect traffic flows. For example, if a link experiences degradation in Q-factor without link failure, operators can use knowledge of the degradation to proactively move critical applications away from the affected link. Thus, it is useful for network operators to be able to monitor optical parameters over time and use optical telemetry information in routing decisions and other applications.

Thus, the packet-optical in-band telemetry (POINT) framework was developed (as described in U.S. patent application Ser. No. 15/801,526, which is incorporated herein by reference in its entirety) to achieve end-to-end correlation of collected network state data in mixed networks with multiple network layers, such as packet-optical networks.

According to the POINT framework, a source device inserts intent information (i.e., POINT intent) for telemetry data collection along with the data flow. The intent communicates the parameters of data collection, such as conditions for data collection, entities being monitored, and the type of data to be collected for that flow. Intermediate devices on that data flow process the high-level intent if it is targeted towards them, translate the intent into a suitable device-specific action for data collection, and execute that action to collect an intent response. At a layer boundary, such as a packet to optical boundary, or across optical layers such as a hierarchy of optical data units (ODUs), intermediate devices translate the intent and response using a layer-appropriate mechanism. For example, in the packet network, the intent and response may be encapsulated using IP options or a VXLAN metadata header. At the packet-optical boundary, the intent can be retrieved from the packet header, and translated and encapsulated as ODU layer metadata, which remains accessible to all nodes along the end-to-end path of the ODU.

In another example, the POINT intent can be translated into an appropriate query for telemetry data collection via the management plane of the optical devices. As soon as the response of data collection is ready, it is communicated through the optical network and translated appropriately into a packet or packet header at the packet-optical boundary and forwarded to the sink for analysis. For example, the response communication may be out-of-band using the optical supervisory channel (OSC). The POINT framework also supports adding response metadata for incorporating deployment-specific reliability mechanisms.

Thus, the POINT framework provides hierarchical layering with intent and response translation at each layer boundary, and mapping of the intent to layer-specific data collection mechanism, such that the POINT framework can be deployed across a network layer hierarchy. The POINT framework also provides for fate sharing of telemetry intent and data flow. Telemetry data for a specific data flow can be collected in-band as the data traverses the network layers. By design, intent responses can be out-of-band to accommodate scenarios such as troubleshooting networks when there is no connectivity between the source and the sink. Additionally, intents, which are high level instructions for data collection, can be mapped to existing data collection mechanisms between two POINT capable intermediate network devices.

FIG. 1 is a high-level diagram of an example POINT framework 100 implemented in an example packet-optical network 102, in accordance with the disclosures herein. The example packet-optical network 102 includes packet devices 110, 112, 114, 116, 118 and 120, and an optical network 104 segment that includes optical devices 122, 124, 126, 128, 130 and 132. The POINT framework 100 can operate over an optical network 104 with Layer-0 (L0) and/or Layer-1 (L1) circuits. The packet devices include a POINT source device 110 and a POINT sink device 120, as well as packet optical gateways (POGs) 114 and 116 located at the interfaces between the packet segments and optical network 104 segment of the packet-optical network 102. The packet devices 110, 120, 114 and 116 can operate at the packet layer, for example at layer 2 (L2)/layer 3 (L3) (e.g., L2 may be a data link layer and L3 may be a network layer, which exist above a physical layer). POGs 114 and 116 are also connected via lower layer devices, such as L1/L0 devices 122, 124, 126, 128, 130 and 132. In the example packet-optical network 102, POGs 114 and 116 and optical devices 126 and 128 are configured as POINT intermediate devices (i.e., devices with digital capability to interpret POINT intent, translate it across layers, and aggregate and propagate the telemetry state information in the packet-optical network 102).

According to the POINT framework 100, telemetry information for a packet-optical traffic flow 105, such as intent or POINT data (e.g., intent and response), in the packet-optical network 102 is gathered in the data plane 140 as part of the information carried in the network 102, as described below. The telemetry plane 160 represents the telemetry information for the packet-optical flow 105 being mapped and correlated across network layers, constructs (e.g., secure network communications (SNC), label-switched path (LSP), or virtual local area network (VLAN)) and devices operating at different layers in the networking stack to give the end user (e.g., at the POINT sink) a unified view of the operation of the entire packet-optical network 102.

In accordance with the disclosed POINT framework 100, a POINT source device 110 may initiate a network telemetry data collection for a packet-optical flow 105 along a packet-optical data path from the source device 110 to a sink device 120. Along the packet-optical data path, POINT intermediate devices, such as POGs 114, 116, and optical devices 126, 128, may interpret the intent, collect the desired telemetry data, and encode it back into the packet (flow) 142, which eventually gets forwarded to the sink device 120. For example, as packet (frame) 142 traverses the packet-optical network 102 across devices and layers (e.g., packet layers L2/L3 and optical layers L1/L0) in the data plane 140, intent information is transferred into other layers and translated into device-specific actions, and responses are collected (e.g., added to POINT data in packet 142) for use at the POINT sink device 120. At the sink device 120, the collected telemetry data for the packet-optical flow 105 (collected from POINT source device 110 to POINT sink device 120) is processed as needed by the intended applications. Examples of telemetry data processing may include triggering a report to a management entity (e.g., using mechanisms like OpenConfig) or archiving collected data in a storage device.

According to the disclosures herein, a sufficient degree of reliability of applications for networking and distributed systems, including telemetry, may be achieved by using solutions to the set difference problem for distributed applications that need to compare remote states. FIG. 2 is a high-level illustration of the set difference problem for two hosts (e.g., network devices). The set difference problem seeks to determine which data or objects are unique to host 201 (in this example, data B and E) and which data or objects are unique to host 202 (in this example, data C and D). By enhancing a telemetry application with a solution to the set difference problem, reliability may be provided because missing data from the source device may be identified by the destination device, or vice versa (i.e., data synchronization). Additionally, solutions to the set difference problem may identify duplicate data.

According to the disclosures herein, a possible solution to the set difference problem is the difference digest, which allows two nodes (e.g., hosts, devices) to compute the elements belonging to the set difference in a single round with communication overhead proportional to the size of the difference times the logarithm of the keyspace. The difference digest may use invertible Bloom filters (IBFs), as described in the following. An IBF is a data structure that supports efficient computation of the set difference between two sets and whose size is proportional to the size of the difference. According to the difference digest, local object identifiers (IDs) (i.e., numbers that uniquely identify each object or element, where the object may be telemetry data such as intent or response data) at a host (e.g., objects A, B, E, F local to host 201, and objects A, C, D, F local to host 202 in FIG. 2) may be encoded into an IBF at the respective host.

An overview of an example algorithm for generating an IBF is described in the following. An empty IBF, prior to application of the hash functions, may consist of an array of m bits set to ‘0’, where m is proportional to the number of data elements (e.g., Object IDs for telemetry data) to be encoded. To populate the IBF, k hash functions may be used, each of which maps (or hashes) a set element (e.g., an Object ID) to one of the m array positions with a uniform random distribution. A hash function is a function that maps data of an arbitrary size (e.g., multiple object IDs) to data of a fixed size (e.g., the hash value, hash code, or hash). Examples of types of hash functions that may be used for creating the IBF data structure may include, but are not limited to, MurmurHash functions or Fowler-Noll-Vo (FNV) hash functions.

In an example, the number of hash functions, k, may be a constant and may be less than the number of bits in the IBF, m. To add an element (e.g., Object ID) to the IBF, each of the k hash functions may be applied to the element to generate k array positions, and the bits associated with the k array positions may be set to ‘1’. To query for an element in the IBF to determine if the element is in the set, the k hash functions may be applied to the element to obtain k array positions. If any of the bits at the k array positions is set to ‘0’, then it can be determined that the queried element is not in the set. If all of the bits at the k array positions are set to ‘1’, then either the queried element is in the set or the bits have been set to ‘1’ by the insertion of other elements, resulting in a false positive. Although the IBF set difference algorithm is referred to herein, any other set difference algorithm may be used.
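The add and query operations described above can be illustrated with a minimal sketch. The class name `BloomFilter`, its method names, and the use of salted SHA-256 as a stand-in for MurmurHash or FNV are assumptions made for illustration and are not part of the disclosure.

```python
import hashlib


class BloomFilter:
    """Array of m bits populated by k hash functions (illustrative sketch)."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.bits = [0] * m  # empty filter: all bits set to 0

    def _positions(self, element: str):
        # Assumption: k hash functions are derived by salting SHA-256 with
        # the hash index; a deployment might use MurmurHash or FNV instead.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{element}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, element: str) -> None:
        # Set the bit at each of the k array positions to 1.
        for pos in self._positions(element):
            self.bits[pos] = 1

    def might_contain(self, element: str) -> bool:
        # False: the element is definitely not in the set.
        # True: the element is in the set, or this is a false positive.
        return all(self.bits[pos] for pos in self._positions(element))
```

A query that returns False is conclusive, while a query that returns True is not, which is why the IBF cells described below carry additional fields.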

FIG. 3 is an overview of the procedure for creating an IBF data structure 300 at a host. For each object stored on the host (e.g., objects with ID A, B, C, which may represent telemetry data such as intent data or response data), one or more hash functions (e.g., hash functions H₃₀₁, H₃₀₂, H₃₀₃) may be applied to the object ID to generate a hash value, represented by H(objectID) (e.g., H₃₀₁(A)), and the hash sum (i.e., the exclusive-OR (XOR)) of the resulting hash values may be taken.

The IBF data structure 300 may include an array of IBF cells 305₁ . . . 305_(ad), each containing: idSum equal to the exclusive-OR (XOR) of all object IDs in the cell; H(objectID), which is the hash value for all object IDs in the cell; hashSum equal to the XOR of one or more hash values (i.e., hash functions applied to object IDs); and count equal to the number of object IDs assigned to the cell. Each object ID (e.g., A, B, C) may be hashed multiple times using different hash functions H₃₀₁, H₃₀₂, H₃₀₃ and assigned to different IBF cells 305₁ . . . 305_(ad), and for a set difference of size d, ad IBF cells are used, where a is an integer greater than 1. IBFs may be created in this manner at the two hosts, and the local IBF may then be traded with the remote host (e.g., host 201 trades its local IBF with host 202 in FIG. 2) to calculate the set (IBF) difference.
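As a hedged illustration of the cell layout described above, the following sketch inserts object IDs into an array of cells holding idSum, hashSum, and count. The names `Cell`, `IBF`, and `cell_indices`, the salted SHA-256 hash, and the choice to split the array into k equal partitions (so that one object ID always lands in k distinct cells) are assumptions for illustration only.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Cell:
    id_sum: int = 0    # XOR of all object IDs assigned to this cell
    hash_sum: int = 0  # XOR of a check hash of each object ID in this cell
    count: int = 0     # number of object IDs assigned to this cell


def _hash(salt: str, object_id: int) -> int:
    """64-bit hash; salted SHA-256 stands in for MurmurHash or FNV."""
    digest = hashlib.sha256(f"{salt}:{object_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


class IBF:
    def __init__(self, num_cells: int, k: int = 3):
        if num_cells < k:
            raise ValueError("need at least one cell per hash function")
        self.k = k
        self.part = num_cells // k  # cells per partition (assumed layout)
        self.cells = [Cell() for _ in range(self.part * k)]

    def cell_indices(self, object_id: int) -> list:
        # Hash function i selects one cell inside partition i.
        return [i * self.part + _hash(f"H{i}", object_id) % self.part
                for i in range(self.k)]

    def insert(self, object_id: int) -> None:
        check = _hash("check", object_id)
        for j in self.cell_indices(object_id):
            cell = self.cells[j]
            cell.id_sum ^= object_id
            cell.hash_sum ^= check
            cell.count += 1
```

Because XOR is its own inverse, a cell whose count is 1 exposes the single object ID it holds in id_sum, which is the property that makes the structure invertible.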

FIG. 4 is an overview of the procedure for calculating the IBF difference. In this example, host 401 has provided its local IBF 411 to host 402, and host 402 takes the set difference (e.g., using the approach described above) between IBF structures 412 and 411 to produce a new IBF 415 containing only unique object IDs for the objects that are on host 401 and not on host 402.
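One way the difference calculation might be carried out is sketched below: the two IBFs are subtracted cell by cell, and the result is decoded by repeatedly "peeling" cells that hold exactly one object ID. The function names, the list-based cell layout `[idSum, hashSum, count]`, the constant `K`, and the salted SHA-256 hash are illustrative assumptions. Decoding can fail when the table is too small for the difference, which the sketch reports through the `ok` flag.

```python
import hashlib

K = 3  # number of hash functions (assumed)


def _h(salt, x) -> int:
    d = hashlib.sha256(f"{salt}:{x}".encode()).digest()
    return int.from_bytes(d[:8], "big")


def _indices(x: int, n: int) -> list:
    part = n // K  # one cell per partition so the K cells are distinct
    return [i * part + _h(i, x) % part for i in range(K)]


def build(ids, n: int) -> list:
    """Encode integer object IDs into n cells of [idSum, hashSum, count]."""
    cells = [[0, 0, 0] for _ in range(n)]
    for x in ids:
        for j in _indices(x, n):
            cells[j][0] ^= x
            cells[j][1] ^= _h("check", x)
            cells[j][2] += 1
    return cells


def subtract(a: list, b: list) -> list:
    """Cell-wise difference; IDs present in both IBFs cancel out."""
    return [[ca[0] ^ cb[0], ca[1] ^ cb[1], ca[2] - cb[2]]
            for ca, cb in zip(a, b)]


def decode(diff: list):
    """Return (IDs only in a, IDs only in b, ok) by peeling pure cells."""
    diff = [c[:] for c in diff]
    n = len(diff)
    only_a, only_b = set(), set()
    progress = True
    while progress:
        progress = False
        for c in diff:
            # A pure cell holds one ID: count is +1 or -1 and hashSum matches.
            if c[2] in (1, -1) and c[1] == _h("check", c[0]):
                x, sign = c[0], c[2]
                (only_a if sign == 1 else only_b).add(x)
                for j in _indices(x, n):  # remove x from all of its cells
                    diff[j][0] ^= x
                    diff[j][1] ^= _h("check", x)
                    diff[j][2] -= sign
                progress = True
    ok = all(c == [0, 0, 0] for c in diff)
    return only_a, only_b, ok
```

Applied to FIG. 2 with objects A through F numbered 1 through 6, `decode(subtract(build({1, 2, 5, 6}, n), build({1, 3, 4, 6}, n)))` is intended to recover objects B and E as unique to host 201 and objects C and D as unique to host 202, provided n is large enough for decoding to succeed.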

According to the disclosures herein, IBFs, or any other set difference algorithm, may be used to provide reliability of intent and/or response data for in-band telemetry applications such as the INT or POINT frameworks. Examples of the disclosed reliability methods and systems are given in the following with reference to the POINT framework; however, any of the disclosed reliability mechanisms, alone or in combination, may be used in any telemetry application, including, but not limited to INT, POINT, and OpenConfig.

According to an example reliable telemetry solution for the intent in a POINT framework, intent information is included in the data path and forwarded along with the data at each networking element (e.g., packet device, optical device, etc.). If local action is warranted at a networking element based on the intent, the networking element may copy the intent and execute an appropriate data collection action. At the point of intent origination (e.g., the POINT source device), which may be, for example, a layer boundary such as the packet-optical layer boundary, once the intent is forwarded, a local destination IBF that is associated with the destination device of the intent (e.g., POINT sink device or node) is updated. The destination device may maintain a corresponding IBF. A different IBF may be maintained by the source device for each device that is a destination for locally originating intent. Note that the source device and sink device may be intermediate devices, such as layer boundary devices, along a network path.

End-to-end reliability for the intent is achieved by the exchange of the IBFs periodically between source device and sink device of a network path, or between devices at layer boundaries along a network path. The source and sink devices (or layer boundary devices) that exchange respective IBFs may take the set difference between the two IBFs to determine if telemetry information (e.g., intent and/or response) is missing at the sink or destination device. For example, if it is determined by the sink device (or layer boundary device) by computing the IBF set difference that particular intent information is missing at the sink device (or a layer boundary device), then the sink device may send a notification message to the source device (i.e., the originating device of the intent) to request retransmission of the missing intent.

In an example, computing the IBF for the intent at a POINT device (e.g., source device, sink device) may involve the POINT device computing a hash function (e.g., MurmurHash or FNV hash) over the following data (or a subset of the following data): the intent, the destination ID, and/or the source timestamp. The intent and the destination ID are used to distinguish the IBF from IBFs for other intents and destinations, and the source timestamp may be included to distinguish telemetry data requests over time. In an example, data freshness can be used instead of the timestamp to check if intent is fresh. In some cases, telemetry data requests (i.e., intent) may be repeated over time. Devices need to distinguish between two requests for the same data that are separated over a time frame (e.g., current temperature queried every hour) and repeated queries for the data within that time frame (e.g., current temperature queried multiple times within the same hour). According to an example, the request data (i.e., intent) may be timestamped and the timestamp may then be used to distinguish between the two scenarios. In another example, response data may be flagged as stale or expired based on freshness criteria or a freshness time threshold, such that response data for the intent may be discarded if the response data does not meet the freshness criteria or threshold (e.g., temperature readings collected more than one hour prior to receiving the intent information are discarded). In another example, every network element along the data path may compute a local destination IBF using the destination ID.
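A minimal sketch of how a key for the IBF might be derived from the intent, the destination ID, and the source timestamp follows. It uses the 64-bit FNV-1a hash. The function name `intent_key`, the length-prefixed field encoding, and the `window_s` parameter that groups timestamps into time frames are assumptions for illustration; the disclosure does not prescribe how the timestamp enters the hash.

```python
FNV64_OFFSET = 0xCBF29CE484222325  # FNV-1a 64-bit offset basis
FNV64_PRIME = 0x100000001B3        # FNV 64-bit prime


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a hash of a byte string."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h


def intent_key(intent: str, dest_id: str, timestamp_s: int,
               window_s: int = 3600) -> int:
    """Hash the intent, destination ID, and time frame into one key.

    Assumption: timestamps are grouped into frames of window_s seconds,
    so repeated queries inside one frame map to the same key, while the
    same query in a later frame maps to a different key.
    """
    frame = timestamp_s // window_s
    fields = [intent.encode(), dest_id.encode(), str(frame).encode()]
    # Length-prefix each field so that field boundaries are unambiguous.
    payload = b"".join(len(f).to_bytes(4, "big") + f for f in fields)
    return fnv1a_64(payload)
```

With a one-hour frame, two temperature queries issued minutes apart yield the same key, whereas the hourly repeat of the query yields a new key that is encoded into the IBF as a distinct element.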

In an example, data integrity may be ensured by including digital signatures of querying entities along with the intent. For example, the source device may apply a digital signature to the intent information. This may be done using the elliptic curve digital signature algorithm or Rivest-Shamir-Adleman (RSA) algorithm. In another example, a device at a layer boundary (e.g., a POG), along with intent translation, may include the digital signature of the source device with the intent. In another example, an intermediate device along the network path may verify a digital signature included with the intent to authenticate the source and intent before generating a response to the intent.

FIG. 5 is a flow diagram of an example procedure 500 for reliable routing of intent performed by a network device that originates the intent in a POINT framework implemented in a communications network, in accordance with the disclosures herein. For example, the network device may be a packet device or an optical device. The network device may be a source device or a layer boundary device (e.g., a POG). At 502, the network device may generate intent information for a particular destination device (e.g., sink device or layer boundary device). At 504, the network device may generate and maintain an IBF, IBF_(DestID), for the intent information destined for the destination device. The IBF may be computed by applying at least one hash function to the intent information and/or the destination ID of the destination device (and/or the source timestamp). Each time new intent information is generated, the network device may update the IBF_(DestID) associated with the destination device.

At 506, the network device may periodically forward the locally computed IBF_(DestID) to the destination device. For example, the network device may forward IBF_(DestID) in-band using the same channel as response communication or using an out-of-band channel such as a management channel. The network device may forward IBF_(DestID) at set time intervals (e.g., every minute), or each time new intent information is generated, for example. In an example, the destination device may compute the set difference between the received and locally generated IBFs. In the case that intent is determined to be missing at the destination device, at 508, the network device may receive a notification message from the destination device indicating which intent information is missing.

In an example not shown in FIG. 5, the network device and the destination device may exchange their locally computed IBFs. In other words, in addition to the network device periodically forwarding its locally computed IBF_(DestID) to the destination device, the network device may also periodically receive, from the destination device, the IBF_(DestID_2) calculated at the destination device. In this case, the network device may compute the set difference between IBF_(DestID) and IBF_(DestID_2) to determine if any intent information is missing at the destination device.

With reference to FIG. 5, if intent information is determined to be missing (either by notification from the destination device or by local computation of the IBF set difference), then, at 510, the network device may retransmit the missing intent information to the destination device. In an example, intent caching at intermediate devices along the network path between the source device and destination device may be used to quickly retransmit the missing intent at the cost of intent storage at the intermediate device. The procedure 500 may be performed each time intent information is generated at the network device.

The procedure 500 may be performed by the network device for multiple destination devices in parallel, such that unique destination IBFs may be maintained for each respective destination device. Additionally, the procedure 500 may be performed by an intermediate device along the network path other than the source device that originated the intent information.
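The source-side steps of procedure 500 can be sketched as follows, under stated assumptions. The class `IntentSource`, the `send` callback standing in for the transport, the message tuples, and the constants `K` and `N_CELLS` are hypothetical names introduced for illustration; the notification is assumed to carry the hashed keys of the missing intent.

```python
import hashlib
from collections import defaultdict

K, N_CELLS = 3, 60  # assumed number of hash functions and cells per IBF


def _h(salt, x) -> int:
    d = hashlib.sha256(f"{salt}:{x}".encode()).digest()
    return int.from_bytes(d[:8], "big")


def intent_key(intent: str, dest_id: str, ts: int) -> int:
    return _h("key", f"{intent}|{dest_id}|{ts}")


def new_ibf() -> list:
    return [[0, 0, 0] for _ in range(N_CELLS)]  # [idSum, hashSum, count]


def ibf_insert(cells: list, key: int) -> None:
    part = N_CELLS // K
    for i in range(K):
        cell = cells[i * part + _h(i, key) % part]
        cell[0] ^= key
        cell[1] ^= _h("check", key)
        cell[2] += 1


class IntentSource:
    def __init__(self, send):
        self.send = send  # hypothetical transport: send(dest_id, message)
        self.ibf_by_dest = defaultdict(new_ibf)  # one IBF per destination
        self.sent = defaultdict(dict)            # dest_id -> key -> intent

    def originate(self, dest_id: str, intent: str, ts: int) -> int:
        # 502/504: forward the intent, then update the destination IBF.
        self.send(dest_id, ("intent", intent, ts))
        key = intent_key(intent, dest_id, ts)
        ibf_insert(self.ibf_by_dest[dest_id], key)
        self.sent[dest_id][key] = (intent, ts)
        return key

    def forward_ibf(self, dest_id: str) -> None:
        # 506: called periodically, or after each new intent.
        snapshot = [c[:] for c in self.ibf_by_dest[dest_id]]
        self.send(dest_id, ("ibf", snapshot))

    def on_missing(self, dest_id: str, missing_keys) -> None:
        # 508/510: retransmit each intent reported missing.
        for key in missing_keys:
            if key in self.sent[dest_id]:
                intent, ts = self.sent[dest_id][key]
                self.send(dest_id, ("intent", intent, ts))
```

Keeping a per-destination log of sent intent is one possible design that lets the source answer a notification without recomputing anything; how long such a log is retained is left open here.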

The network device may read the POINT intent from the data (packet or frame). If intent instructions apply to the network device, the network device may translate the intent into a suitable device-specific action for data collection, execute that action, and collect the response. The network device may forward the intent on the outbound interface along the network path. The network device may update the destination IBF, IBF_(DestID), associated with the destination device.

As explained above, the destination device may also calculate a local IBF. The destination device may generate and maintain a local IBF, IBF_(DestID_2), for the intent information associated with its own destination ID and/or the source ID (to distinguish from IBFs from other source devices). The IBF_(DestID_2) may be computed by applying at least one hash function to the intent information and/or the destination ID at the destination device (and/or the source timestamp). Each time new intent information is received, the destination device may update the IBF_(DestID_2). The destination device may periodically receive from the source device (i.e., the device that originated the intent) the IBF_(DestID) that was computed at the source device. For example, the destination device may receive IBF_(DestID) in-band over the same channel as response communication or over an out-of-band channel such as a management channel. The destination device may compute the set difference between the received IBF_(DestID) and the locally generated IBF_(DestID_2). In the case that intent is determined to be missing at the destination device, the destination device may send a notification message to the source device indicating which intent information is missing.

In the following, example reliable telemetry solutions for the response data in a POINT framework (or any in-band telemetry framework, such as INT) are described, in accordance with the disclosures herein. In the POINT framework, response data generated (i.e., in response to received intent) by a device along the network path may be forwarded hop-by-hop by intermediate devices along the network path to the destination (sink) device. In an example, response data may be stored and forwarded at each network device along the network path using the data forwarding mechanisms programmed/configured on those network devices. In an example, an intermediate network device may append its respective local response(s) to the data packet/frame carrying existing responses as the data packet/frame is forwarded along the network path. In another example, the locally generated response data at a network device may be aggregated with the existing response data that is being carried in the data frame/packet.

In order to achieve hop-by-hop and end-to-end reliability for response data, in the case that local responses are appended to a data frame/packet as it traverses the network, a response IBF_(response) associated with the destination may be generated and updated along with the response data at each device along the network path. For example, the response data and IBF_(response) carried in the data frame/packet may have the following structure: {<Device ID₁, Response₁>, . . . <Device ID_(n), Response_(n)>, IBF_(response)}, where the Response_(i) was generated by the device with Device ID_(i), and IBF_(response) is based on the accumulated responses Response₁ . . . Response_(n) in the data frame/packet. In an example, any device that generates a response along the network path may also update the IBF_(response) by taking a hash function over <DeviceID, Response> for all device/response pairs in the data frame or packet including the locally generated response for the device itself. Then, the updated IBF_(response) may replace the previous IBF_(response) in the data frame/packet and thus be forwarded along the network path toward the destination device.

The destination device (e.g., sink device or layer boundary device) may locally compute/re-compute IBF_(response_2) based on the responses it has received and compare IBF_(response_2) with the IBF_(response) received in the data frame/packet in order to compute the set difference of the IBFs and determine which responses are missing at the destination device. For response data determined to be missing, the destination device may send a notification message to an intermediate device and/or the source device to request retransmission of the missing response data. For example, this may result in the missing response data being retransmitted, and/or the intent information associated with the missing response data being retransmitted. In an example, for high loss rates of response data, other set difference algorithms, such as difference digests that maintain a number of IBFs along with an estimate of the size of the set difference, may be used instead of IBFs.

In order to achieve data integrity of response data, in the case that local responses are appended to a data frame/packet as it traverses the network (“append model”), cryptographic hash functions with IBFs and/or digital signatures may be used. For example, as response data traverses a network path from the source device to the destination device, each intermediate device may update the Bloom filter (e.g., IBF_(response)) and forward the response data and Bloom filter along the network path, as described above. The sink device (or a layer boundary device) may then verify the contents of the data and re-compute the Bloom filter IBF_(response). If the filter values match, the response contents are deemed valid and forwarded. In an example, the Bloom filter calculation may use a cryptographic hash function (e.g., SHA-1), with the source device and sink device sharing keys with intermediate devices along the network path for use in the cryptographic hash function computation. Each device may update the Bloom filter IBF_(response) using the shared key(s) and known hash functions.
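By way of a non-limiting illustration, the append model with a keyed IBF update may be sketched in Python as follows. The sketch assumes a single shared key, HMAC-SHA-256 as the keyed hash (chosen here in place of the SHA-1 example for illustration only), a small table of three-field cells, and the hypothetical names keyed_digest, build_ibf, append_response and sink_accepts. Each device appends its <Device ID, Response> pair and recomputes IBF_(response) over all pairs in the frame; the sink recomputes the table from the received pairs and accepts the frame only if the tables match.

```python
import hashlib
import hmac

CELLS = 16   # number of IBF cells (assumed for illustration)
HASHES = 3   # number of keyed hash functions (assumed for illustration)


def keyed_digest(shared_key, device_id, response, salt):
    """64-bit keyed hash over a <Device ID, Response> pair."""
    msg = bytes([salt]) + device_id.encode() + b"\x00" + response.encode()
    tag = hmac.new(shared_key, msg, hashlib.sha256).digest()
    return int.from_bytes(tag[:8], "big")


def build_ibf(shared_key, pairs):
    """Recompute IBF_(response) over every pair carried in the frame."""
    # Each cell is [count, XOR of element digests, XOR of check digests].
    cells = [[0, 0, 0] for _ in range(CELLS)]
    for device_id, response in pairs:
        element = keyed_digest(shared_key, device_id, response, 255)
        check = keyed_digest(shared_key, device_id, response, 254)
        for salt in range(HASHES):
            idx = keyed_digest(shared_key, device_id, response, salt) % CELLS
            cells[idx][0] += 1
            cells[idx][1] ^= element
            cells[idx][2] ^= check
    return cells


def append_response(shared_key, frame, device_id, response):
    """Append the local response and replace the IBF carried in the frame."""
    pairs = frame["responses"] + [(device_id, response)]
    return {"responses": pairs, "ibf": build_ibf(shared_key, pairs)}


def sink_accepts(shared_key, frame):
    """Sink-side check: recompute the IBF and compare with the received one."""
    return frame["ibf"] == build_ibf(shared_key, frame["responses"])


key = b"shared-secret"  # hypothetical shared key
frame = {"responses": [], "ibf": build_ibf(key, [])}
frame = append_response(key, frame, "dev-1", "latency=12us")
frame = append_response(key, frame, "dev-2", "latency=9us")
```

A frame whose response contents are altered in transit without knowledge of the shared key would, with overwhelming probability, fail the comparison in sink_accepts.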

In another example, the intermediate device that is the originator of response data and the destination device may exchange a keyed hash function value (HMac(K;m)) in addition to the IBF in order to authenticate and validate the data integrity of response data and the IBF. For example, let H() denote a cryptographic hash function, and HMac() denote a keyed hash function for message authentication. To authenticate a message m (e.g., a frame or packet) with a secret symmetric key K, the keyed hash function HMac(K;m) may be calculated according to the following equation:

HMac(K;m)=H((K⊕opad)∥H((K⊕ipad)∥m))  Equation 1

where ∥ denotes concatenation, ⊕ denotes exclusive-OR (XOR), opad is the one-block-size outer padding (0x5c5c...5c), and ipad is the one-block-size inner padding (0x3636...36). In an example, the source device may calculate HMac(K;m) from the data and shared keys, and include it in a packet header with the response data and IBF. The destination device may also locally compute HMac₂(K;m). The destination device may proceed to use a received IBF when the HMac(K;m) received from the source device matches the locally computed HMac₂(K;m). If the locally computed HMac₂(K;m) does not match the HMac(K;m) received from the source device, then the data may be considered to have been tampered with, and the destination device may choose to discard the received IBF and/or response data. The approaches described above for data integrity and authentication, including the use of cryptographic hash functions, digital signatures, and keyed hash functions, may be used in a similar fashion for intent data and/or response data.
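By way of a non-limiting illustration, the keyed hash computation and the comparison at the destination device may be sketched in Python as follows. The sketch assumes SHA-256 as the hash function H() with a 64-byte block size, and the hypothetical names hmac_from_padding and destination_accepts. The key is zero-padded to one block (or first hashed if longer than one block), XORed with the outer and inner paddings, and hashed in two passes, which is the standard HMAC construction.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # block size of SHA-256 in bytes


def hmac_from_padding(key, message):
    """Keyed hash built from the inner and outer paddings, with H = SHA-256."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()      # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")        # pad the key to one block
    k_opad = bytes(b ^ 0x5C for b in key)       # K XOR opad
    k_ipad = bytes(b ^ 0x36 for b in key)       # K XOR ipad
    inner = hashlib.sha256(k_ipad + message).digest()
    return hashlib.sha256(k_opad + inner).digest()


def destination_accepts(key, message, received_tag):
    """Recompute the keyed hash locally and compare in constant time."""
    local_tag = hmac_from_padding(key, message)
    return hmac.compare_digest(local_tag, received_tag)


shared_key = b"secret-symmetric-key"            # hypothetical key K
packet = b"<dev-1,latency=12us>|IBF-bytes"      # hypothetical message m
tag = hmac_from_padding(shared_key, packet)
```

The value produced by hmac_from_padding agrees with the output of the Python standard library call hmac.new(shared_key, packet, hashlib.sha256).digest(), which in practice would be used directly.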

FIG. 6 is a flow diagram of an example procedure 600 for reliable routing of response data performed by a network device that generates response data in a POINT framework implemented in a communications network, in accordance with the disclosures herein. For example, the network device may be a packet device or an optical device. The network device may be any intermediate device that generates response data along the network path between the source device and the destination device. At 602, the network device may receive and read intent information in a received data packet or frame. At 604, the network device may process (e.g., translate) the intent and generate a response.

At 606, the network device may generate and maintain an IBF, IBF_(DestID), for the response information destined for the destination device. The IBF_(DestID) may be computed by applying at least one hash function to the locally generated response information and/or the destination ID of the destination device (and/or the local timestamp). Each time new response information is generated, the network device may update the IBF_(DestID) associated with the destination device. The hash function may be a cryptographic hash function, as described above.

At 608, the network device may periodically forward the locally computed IBF_(DestID) to the destination device. For example, the network device may forward IBF_(DestID) in-band using the same channel as response communication or using an out-of-band channel such as a management channel. The network device may forward IBF_(DestID) at set time intervals (e.g., every 15 minutes), or each time new intent information is generated, for example. In an example, the destination device may compute the set difference between the received IBF_(DestID) and its locally generated IBF. In the case that response data is determined to be missing at the destination device, at 610, the network device may receive a notification message from the destination device indicating which response information is missing.

In an example not shown in FIG. 6, the network device and the destination device may exchange their locally computed IBFs. In other words, in addition to the network device periodically forwarding its locally computed IBF_(DestID) to the destination device, the network device may also periodically receive, from the destination device, the IBF_(DestID_2) calculated at the destination device. In this case, the network device may compute the set difference between IBF_(DestID) and IBF_(DestID_2) to determine if any response information is missing at the destination device.

With reference to FIG. 6, if response information is determined to be missing (either by notification from the destination device or by local computation of the IBF set difference), then, at 612, the network device may retransmit the missing response information to the destination device. In an example, response caching at intermediate devices along the network path between the network device and destination device may be used to quickly retransmit the missing response data at the cost of intent storage at the intermediate device. The procedure 600 may be performed each time response information is generated at the network device, and the IBF_(DestID) may be updated each time new response data is locally generated for the associated destination device. The procedure 600 may be performed by the network device for multiple destination devices in parallel, such that unique destination IBFs may be maintained for each respective destination device.
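By way of a non-limiting illustration, the per-destination bookkeeping and the retransmission at 612 may be sketched in Python as follows. The sketch deliberately omits the IBF itself and keeps only the keyed set of responses that each IBF_(DestID) would summarize. It assumes that a notification message carries the keys of the missing responses, and uses the hypothetical names response_key and ResponseSender.

```python
import hashlib
from collections import defaultdict


def response_key(dest_id, response):
    """Key under which a response would be inserted into IBF_(DestID)."""
    digest = hashlib.sha256((dest_id + "|" + response).encode()).hexdigest()
    return digest[:16]


class ResponseSender:
    """Tracks locally generated responses separately for each destination."""

    def __init__(self):
        # dest_id -> {key: response}; one table per destination device
        self.sent = defaultdict(dict)

    def record_response(self, dest_id, response):
        """Called each time new response data is generated (step 606)."""
        key = response_key(dest_id, response)
        self.sent[dest_id][key] = response
        return key

    def handle_notification(self, dest_id, missing_keys):
        """Return the responses to retransmit (steps 610 and 612)."""
        cache = self.sent.get(dest_id, {})
        return [cache[k] for k in missing_keys if k in cache]


sender = ResponseSender()
k1 = sender.record_response("sink-7", "dev-3:latency=12us")
k2 = sender.record_response("sink-7", "dev-3:queue=4")
k3 = sender.record_response("sink-9", "dev-3:latency=12us")
to_resend = sender.handle_notification("sink-7", [k2])
```

Keeping a separate table per destination ID mirrors the maintenance of a unique IBF for each destination device, so that a notification from one destination device does not trigger retransmission toward another.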

As explained above, the destination device may also calculate a local IBF. The destination device may generate and maintain a local IBF, IBF_(DestID_2), for the response information associated with its own destination ID and/or the ID of the originating device of the response data. The IBF_(DestID_2) may be computed by applying at least one hash function to the response information and/or the destination ID at the destination device (and/or the source timestamp). Each time new response information is received, the destination device may update the IBF_(DestID_2). The destination device may periodically receive from the source device (i.e., the device that originated the response) the IBF_(DestID) that was computed at the source device. For example, the destination device may receive IBF_(DestID) in-band over the same channel as response communication or over an out-of-band channel such as a management channel. The destination device may compute the set difference between the received IBF_(DestID) and the locally generated IBF_(DestID_2). In the case that response data is determined to be missing at the destination device, the destination device may send a notification message to the source device indicating which response information is missing.

In an example, the disclosed reliable telemetry methods and system, and any subset or one or more component(s) thereof, may be implemented using software and/or hardware and may be partially or fully implemented by computing devices, such as the computing device 700 shown in FIG. 7.

FIG. 7 is a block diagram of a computing system 700 in which one or more disclosed embodiments may be implemented. The computing system 700 may include, for example, a computer, a switch, a router, a gaming device, a handheld device, a set-top box, a television, a mobile phone, or a tablet computer. The computing device 700 may include a processor 702, a memory 704, a storage device 706, one or more input devices 708, and/or one or more output devices 710. The input devices 708 and output devices 710 may be generally referred to as interfaces for the computing device 700. The device 700 may include an input driver 712 and/or an output driver 714. The device 700 may include additional components not shown in FIG. 7.

The processor 702 may include a central processing unit (CPU), a graphics processing unit (GPU), a CPU and GPU located on the same die, or one or more processor cores, wherein each processor core may be a CPU or a GPU. The memory 704 may be located on the same die as the processor 702, or may be located separately from the processor 702. The memory 704 may include a volatile or non-volatile memory, for example, random access memory (RAM), dynamic RAM, or a cache.

The storage device 706 may include a fixed or removable storage, for example, a hard disk drive, a solid state drive, an optical disk, or a flash drive. The input devices 708 may include a keyboard, a keypad, a touch screen, a touch pad, a detector, a microphone, an accelerometer, a gyroscope, a biometric scanner, or a network connection (e.g., a wireless local area network card for transmission and/or reception of wireless IEEE 802 signals). The output devices 710 may include a display, a speaker, a printer, a haptic feedback device, one or more lights, an antenna, or a network connection (e.g., a wireless local area network card for transmission and/or reception of wireless IEEE 802 signals).

The input driver 712 may communicate with the processor 702 and the input devices 708, and may permit the processor 702 to receive input from the input devices 708. The output driver 714 may communicate with the processor 702 and the output devices 710, and may permit the processor 702 to send output to the output devices 710. The output driver 714 may include an accelerated processing device (“APD”) 716 which may be coupled to a display device 718. The APD 716 may be configured to accept compute commands and graphics rendering commands from the processor 702, to process those compute and graphics rendering commands, and to provide pixel output to the display device 718 for display.

In an example, with reference to FIG. 1, the point source 110, packet devices 112 and 118, optical devices 122-132, and/or POGs 114, may be implemented, at least in part, with the components of computing device 700 shown in FIG. 7. Similarly, the procedures 500 and 600 shown in FIGS. 5 and 6 may be implemented, at least in part, with the components of computing device 700 shown in FIG. 7.

It should be understood that many variations are possible based on the disclosure herein. Although features and elements are described above in particular combinations, each feature or element may be used alone without the other features and elements or in various combinations with or without other features and elements.

The methods and elements disclosed herein may be implemented in/as a general purpose computer, a processor, a processing device, or a processor core. Suitable processing devices include, by way of example, a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGA) circuits, any other type of integrated circuit (IC), and/or a state machine. Such processors may be manufactured by configuring a manufacturing process using the results of processed hardware description language (HDL) instructions and other intermediary data including netlists (such instructions capable of being stored on a computer-readable medium). The results of such processing may be maskworks that are then used in a semiconductor manufacturing process to manufacture a processor which implements aspects of the embodiments.

The methods, flow charts and elements disclosed herein may be implemented in a computer program, software, or firmware incorporated in a non-transitory computer-readable storage medium for execution by a general purpose computer or a processor. Examples of non-transitory computer-readable storage mediums include a read only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks, and digital versatile disks (DVDs). 

What is claimed is:
 1. A network device comprising: a processor coupled to at least one interface; the processor and the at least one interface configured to: generate intent information for a destination device, wherein the intent information indicates a type of telemetry data to be collected along a network path to the destination device; update a locally stored invertible Bloom function (IBF) by applying at least one hash function to at least the intent information and a destination identifier (ID) associated with the destination device; and periodically forward the locally stored IBF to the destination device.
 2. The network device of claim 1, wherein the processor and the at least one interface are further configured to: forward the intent information to the destination device, wherein the locally stored IBF is forwarded with the intent information.
 3. The network device of claim 1, wherein the processor and the at least one interface are further configured to: receive a notification message that the intent information is missing at the destination device based at least in part on the locally stored IBF; and re-forward the intent information to the destination device.
 4. The network device of claim 1, wherein the processor and the at least one interface are further configured to: apply the at least one hash function to a local timestamp in addition to the intent information and the destination ID to update the locally stored IBF.
 5. The network device of claim 1, wherein the processor and the at least one interface are further configured to: periodically receive a destination IBF calculated at the destination device; and compute a set difference between the locally stored IBF and the destination IBF to determine if the intent information is missing at the destination device.
 6. The network device of claim 1, wherein the processor and the at least one interface are further configured to: apply a digital signature to the intent information.
 7. The network device of claim 1, wherein the processor and the at least one interface are further configured to: apply a keyed hash function to at least a secret symmetric key and a packet or frame to generate a keyed hash function value; and forward the keyed hash function value with the locally stored IBF in the packet or frame to the destination device.
 8. The network device of claim 1, wherein the processor and the at least one interface are further configured to: generate second intent information for a second destination device, wherein the second intent information indicates a type of telemetry data to be collected along a second network path to the second destination device; update a second locally stored IBF by applying at least a second hash function to at least the second intent information and a destination identifier (ID) associated with the second destination device; and periodically forward the second locally stored IBF to the second destination device.
 9. A network device comprising: a processor coupled to at least one interface; the processor and the at least one interface configured to: receive a packet or frame including intent information, wherein the intent information indicates a type of telemetry data to be collected along a network path to a destination device; read and translate the intent information to generate a device-specific action; execute the device-specific action to generate a local response corresponding to the intent information; encode the local response; update a locally stored invertible Bloom function (IBF) by applying at least one hash function to at least the local response and a destination identifier (ID) associated with the destination device; and periodically forward the locally stored IBF to the destination device.
 10. The network device of claim 9, wherein the processor and the at least one interface are further configured to: forward the encoded local response to the destination device, wherein the locally stored IBF is forwarded with the encoded local response.
 11. The network device of claim 9, wherein the packet or frame further includes a plurality of responses generated by other devices along the network path, and wherein the processor and the at least one interface are further configured to: apply the at least one hash function to the plurality of responses in addition to the local response and the destination ID to update the locally stored IBF; and append the local response to the plurality of responses carried in the packet or frame.
 12. The network device of claim 9, wherein the processor and the at least one interface are further configured to: receive a notification message that the response is missing at the destination device based at least in part on the locally stored IBF; and re-forward the local response to the destination device.
 13. The network device of claim 9, wherein the processor and the at least one interface are further configured to: receive a secret symmetric key; apply a keyed hash function to at least the secret symmetric key and the packet or frame to generate a keyed hash function value; and forward the keyed hash function value with the locally stored IBF to the destination device.
 14. The network device of claim 9, wherein the processor and the at least one interface are further configured to: periodically receive a destination IBF calculated at the destination device; and compute a set difference between the locally stored IBF and the destination IBF to determine if the local response is missing at the destination device.
 15. The network device of claim 9, wherein the processor and the at least one interface are further configured to: receive a second packet or frame including second intent information, wherein the second intent information indicates a type of telemetry data to be collected along a second network path to a second destination device; read and translate the second intent information to generate a second device-specific action; execute the second device-specific action to generate a second local response corresponding to the second intent; encode the second local response; update a second locally stored IBF by applying at least a second hash function to at least the second local response and a destination ID associated with the second destination device; and periodically forward the second locally stored IBF to the second destination device.
 16. A method performed by a network device, the method comprising: generating intent information for a destination device, wherein the intent information indicates a type of telemetry data to be collected along a network path to the destination device; updating a locally stored invertible Bloom function (IBF) by applying at least one hash function to at least the intent information and a destination identifier (ID) associated with the destination device; and periodically forwarding the locally stored IBF to the destination device.
 17. The method of claim 16, further comprising: forwarding the intent information to the destination device, wherein the locally stored IBF is forwarded with the intent information.
 18. The method of claim 16, further comprising: receiving a notification message that the intent information is missing at the destination device based at least in part on the locally stored IBF; and re-forwarding the intent information to the destination device.
 19. The method of claim 16, further comprising: applying the at least one hash function to a local timestamp in addition to the intent information and the destination ID to update the locally stored IBF.
 20. The method of claim 16, further comprising: periodically receiving a destination IBF calculated at the destination device; and computing a set difference between the locally stored IBF and the destination IBF to determine if the intent information is missing at the destination device.